2023-03-13

Linear Regression

Linear regression is a statistical method that models a linear relationship between an independent variable and a dependent variable, so that the dependent variable can be predicted for new values of the independent variable. It is widely used in data science and machine learning for predictive analysis.

Regression Line:

\(E(y) = \beta_0 + \beta_1 x_1\)

\(var(y) = \sigma ^2\)

Least Squares Line:

\(\hat{y} = a + bx\)
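
For reference, the intercept \(a\) and slope \(b\) of the least squares line are the values that minimize the sum of squared vertical deviations from the line. In the standard textbook form, with \(\bar{x}\) and \(\bar{y}\) denoting the sample means (notation assumed here, not defined elsewhere in these slides):

\(b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}\)

\(a = \bar{y} - b \bar{x}\)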

Dataset mtcars

We load the mtcars data set, which was extracted from the 1974 Motor Trend US magazine.

data(mtcars)
df <- mtcars  # keep all 32 observations
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The data set compares fuel consumption and 10 aspects of automobile design and performance for 32 automobiles.

Consider the following plot:

library(ggplot2)
ggplot(data = df, aes(x = wt, y = mpg)) +
  geom_point(colour = "red", na.rm = TRUE)

Analyzing the plot

On the following slides we apply the equations for linear regression. The independent variable is the weight of the cars (wt), which we use to build a predictive model of miles per gallon (mpg). The data set is small, with only 32 observations, but the plot suggests a linear relationship between weight and miles per gallon.

\(SSResid = \sum(y - \hat{y})^2 = \sum y^2 - a \sum y - b \sum xy\)
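
As a quick check of the formula above, the following sketch computes the least squares coefficients for mpg on wt by hand and then evaluates SSResid both from its definition and from the computational shortcut. The variable names (`ss_resid_definition`, `ss_resid_shortcut`) are illustrative only, and the code reads mtcars directly, not the `df` object used on the other slides.

```r
# Least squares fit of mpg on wt, computed by hand (illustrative sketch)
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg

# Slope and intercept of the least squares line y_hat = a + b * x
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a <- mean(y) - b * mean(x)

# SSResid from the definition: sum of squared residuals
y_hat <- a + b * x
ss_resid_definition <- sum((y - y_hat)^2)

# SSResid from the shortcut: sum(y^2) - a * sum(y) - b * sum(x * y)
ss_resid_shortcut <- sum(y^2) - a * sum(y) - b * sum(x * y)

# The two values should agree up to floating point error
all.equal(ss_resid_definition, ss_resid_shortcut)
```

The shortcut form subtracts large, nearly equal quantities, so in practice the definition is the numerically safer of the two.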

Plot with linear regression

ggplot(data = df, aes( x = wt , y = mpg)) +
  geom_point(color ='red') + 
  geom_smooth(method = "lm" , se = FALSE)
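
The line drawn by `geom_smooth(method = "lm")` is an ordinary least squares fit of y on x. To see the numbers behind the picture, a minimal sketch is to fit the same model with `lm()` and inspect it. The object name `fit` and the example weight of 3 (that is, 3000 lbs) are assumptions made for illustration.

```r
# Fit mpg as a linear function of wt (illustrative sketch)
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept (a) and slope (b) of the least squares line
coef(fit)

# Predicted mpg for a hypothetical car with wt = 3
new_car <- data.frame(wt = 3)
predicted_mpg <- predict(fit, newdata = new_car)
predicted_mpg
```

The slope is negative, which matches the downward trend in the scatter plot: heavier cars tend to get fewer miles per gallon.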

3D plotly

We use plotly to create a three-dimensional visualization of weight vs. mpg vs. number of cylinders.

library(plotly)
plot_ly(df, x = ~wt, y = ~mpg, z = ~cyl) %>% add_markers(size = 1.5)

3D plot with linear regression